Similar Resources
On Contrastive Divergence Learning
Maximum-likelihood (ML) learning of Markov random fields is challenging because it requires estimates of averages that have an exponential number of terms. Markov chain Monte Carlo methods typically take a long time to converge on unbiased estimates, but Hinton (2002) showed that if the Markov chain is only run for a few steps, the learning can still work well and it approximately minimizes a d...
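As background for the idea summarized above (a standard energy-based-model identity, not text quoted from the paper): the average log-likelihood gradient of a Markov random field with energy E(x; θ) splits into a data term and a model term, and contrastive divergence approximates the intractable model term using samples obtained after only k steps of the Markov chain,

\[
\frac{\partial}{\partial \theta}\,\big\langle \log p(\mathbf{x};\theta) \big\rangle_{p_0}
= -\Big\langle \frac{\partial E(\mathbf{x};\theta)}{\partial \theta} \Big\rangle_{p_0}
+ \Big\langle \frac{\partial E(\mathbf{x};\theta)}{\partial \theta} \Big\rangle_{p_\infty}
\;\approx\;
-\Big\langle \frac{\partial E(\mathbf{x};\theta)}{\partial \theta} \Big\rangle_{p_0}
+ \Big\langle \frac{\partial E(\mathbf{x};\theta)}{\partial \theta} \Big\rangle_{p_k},
\]

where p_0 denotes the data distribution, p_∞ the model's equilibrium distribution, and p_k the distribution reached after k Gibbs steps started from the data.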
Divergence Function, Information Monotonicity and Information Geometry
A divergence function measures how different two points are in a base space. Well-known examples are the Kullback-Leibler divergence and f-divergence, which are defined in a manifold of probability distributions. The Bregman divergence is used in a more general situation. The present paper characterizes the geometrical structure which a divergence function gives, and proves that the f-divergence...
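For reference (textbook definitions, not content specific to this paper), the three divergences named in the abstract are, for probability vectors p, q and a strictly convex differentiable function φ:

\[
D_{\mathrm{KL}}(p\,\|\,q) = \sum_i p_i \log \frac{p_i}{q_i}, \qquad
D_f(p\,\|\,q) = \sum_i q_i\, f\!\left(\frac{p_i}{q_i}\right) \ \ (f \text{ convex},\ f(1)=0), \qquad
D_\phi(p\,\|\,q) = \phi(p) - \phi(q) - \langle \nabla\phi(q),\, p - q\rangle .
\]

The Kullback-Leibler divergence is a special case of both: f(t) = t log t recovers it as an f-divergence, and φ(p) = Σ_i p_i log p_i recovers it as a Bregman divergence.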
Lower bounds on Information Divergence
In this paper we establish lower bounds on information divergence from a distribution to certain important classes of distributions, such as the Gaussian, exponential, Gamma, Poisson, geometric, and binomial. These lower bounds are tight, and for several convergence theorems where a rate of convergence can be computed, this rate is determined by the lower bounds proved in this paper. General techniques fo...
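As a classical illustration of what a divergence bound relative to a whole class looks like (given here only as background; it is not a result quoted from the paper), the smallest KL divergence from a distribution P with mean μ_P, variance σ_P² and differential entropy h(P) to the Gaussian family is

\[
\min_{\mu,\,\sigma^2} D\big(P \,\|\, \mathcal{N}(\mu,\sigma^2)\big)
= h\big(\mathcal{N}(\mu_P,\sigma_P^2)\big) - h(P)
= \tfrac{1}{2}\log\!\big(2\pi e\,\sigma_P^2\big) - h(P),
\]

attained at μ = μ_P and σ² = σ_P²; this quantity is nonnegative and vanishes only when P itself is Gaussian.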
Information geometry of divergence functions
Measures of divergence between two points play a key role in many engineering problems. One such measure is a distance function, but there are many important measures which do not satisfy the properties of a distance. The Bregman divergence, Kullback-Leibler divergence and f-divergence are such measures. In the present article, we study the differential-geometrical structure of a manifold ind...
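A textbook-level sketch of the structure this abstract refers to (not taken from the article itself): a sufficiently smooth divergence D induces a Riemannian metric on the underlying manifold through its second-order expansion around the diagonal,

\[
D(\theta \,\|\, \theta + d\theta) \;\approx\; \tfrac{1}{2}\sum_{i,j} g_{ij}(\theta)\, d\theta_i\, d\theta_j,
\qquad
g_{ij}(\theta) = \frac{\partial^2}{\partial \theta_i' \,\partial \theta_j'}\, D(\theta \,\|\, \theta')\Big|_{\theta'=\theta},
\]

and for the Kullback-Leibler divergence on a parametric family of distributions this g_{ij} is the Fisher information matrix.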
Bounding the Bias of Contrastive Divergence Learning
Optimization based on k-step contrastive divergence (CD) has become a common way to train restricted Boltzmann machines (RBMs). The k-step CD is a biased estimator of the log-likelihood gradient relying on Gibbs sampling. We derive a new upper bound for this bias. Its magnitude depends on k, the number of variables in the RBM, and the maximum change in energy that can be produced by changing a ...
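As a concrete illustration of the k-step CD estimator whose bias the abstract bounds, here is a minimal NumPy sketch for a binary RBM (the parameter names W, b, c and the helper cd_k_gradient are illustrative assumptions, not the authors' code); rng is expected to be a NumPy generator such as np.random.default_rng(0):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_gradient(v0, W, b, c, k, rng):
    # Biased CD-k estimate of the log-likelihood gradient for a binary RBM
    # with weight matrix W, visible bias b, hidden bias c; v0 is a data batch.
    ph0 = sigmoid(v0 @ W + c)           # positive-phase hidden probabilities
    vk = v0.copy()
    for _ in range(k):                  # k steps of block Gibbs sampling
        hk = (rng.random(ph0.shape) < sigmoid(vk @ W + c)).astype(float)
        vk = (rng.random(v0.shape) < sigmoid(hk @ W.T + b)).astype(float)
    phk = sigmoid(vk @ W + c)           # negative-phase hidden probabilities
    n = v0.shape[0]
    dW = (v0.T @ ph0 - vk.T @ phk) / n  # data term minus k-step sample term
    db = (v0 - vk).mean(axis=0)
    dc = (ph0 - phk).mean(axis=0)
    return dW, db, dc

Because vk comes from only k Gibbs steps rather than from the model's equilibrium distribution, (dW, db, dc) is a biased estimate of the true log-likelihood gradient; it is this bias that the paper's upper bound controls.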
Journal
Journal title: IEEE Transactions on Pattern Analysis and Machine Intelligence
Year: 2015
ISSN: 0162-8828, 2160-9292
DOI: 10.1109/tpami.2014.2366144